ABSTRACT

This report contains a qualitative discussion of sequential decoding
directed toward the non-specialist in information theory. Both the Wozencraft
and Fano algorithms are included. The description of sequential decoding is
preceded by a summary of the general problem of probabilistic coding and
decoding.

Introduction

In 1948 C. E. Shannon stated his so-called coding theorem: one can
communicate at rates arbitrarily close to a maximum called the channel
capacity with arbitrarily low error probabilities, provided one uses suitably
complex signals with which to modulate the channel. Finding suitably complex
signals or codes which are also realizable in practical equipment has been a
formidable problem. The particular coding scheme which seems most promising
in making Shannon's theorem practical was developed by J. M. Wozencraft in
1958 and is called sequential decoding. A machine (SECO) realizing these
principles was constructed at Lincoln Laboratory and, when used on a
telephone line, resulted in a 4-fold increase in the data rate over
conventional techniques with a negligibly small error probability. A variation
of the original scheme was developed by R. M. Fano (4), and the first results
(5) obtained in simulating this scheme are promising.

In this report we discuss the general principles of sequential decoding
at a level which, it is hoped, will be understandable to the non-specialist in
information theory. Both the Wozencraft and Fano schemes are discussed
and an effort is made to point out their basic similarities and their differences.

The description of sequential decoding is preceded by a qualitative
discussion of the general coding problem.


Discrete  Communications  Channels 

The  term  channel  is  a  very  overworked  one.  We  speak  of  the  gaussian 
channel,  telephone  channel,  H-F  channel,  satellite  channel,  binary  symmetric 
channel,  etc.  Some  of  these  words  are  descriptive  of  transmission  media, 
some  are  descriptive  of  noise  perturbations.  We  shall  begin  with  a  definition 
of  a  channel  in  the  information  theoretic  sense  and  use  the  word  only  in  this 
context  throughout  this  note. 

A channel is described by first determining a set of signals or waveforms
which the transmitter is allowed to use. We can label each of the waveforms
with a symbol, say a_i, and we call the set of these symbols or waveforms the
input alphabet. These waveforms are transmitted to a receiver through a
communications medium which, in general, distorts the signals. The receiver,
in turn, operates on the received signal and thereby generates a set of output
symbols b_j called the output alphabet. A channel is completely specified if
the distortion introduced in the transmission can be completely described, in
the sense that if symbol a_i is transmitted, we know the probability of
receiving each of the b_j's. Thus a channel can be represented by the block
diagram of Fig. 1 where we show an input alphabet A, an output alphabet B,
and a set of arrows from each a_i to each b_j, each arrow labelled by the
conditional probability p(b_j | a_i). Thus the channel includes the
transmitter and receiver as well as the transmission medium.

Fig. 1. A discrete channel.

Fig. 2. The binary symmetric channel.

    p(b_0 | a_0) = p(b_1 | a_1) = q
    p(b_0 | a_1) = p(b_1 | a_0) = p

Fig. 3. An approximation to the binary symmetric channel.

The above definition of a channel is strictly true only for a so-called
memoryless channel, i.e., a channel in which it is meaningful to speak of
p(b_j | a_i) without reference to previously transmitted symbols. In this note
we shall consider only memoryless channels.

The binary-symmetric channel, shown in Fig. 2, is a simple example
of a channel. Here, the transmitter can send one of two signals, a_0 and
a_1, and the receiver gives one of two answers, b_0 and b_1. The channel
transition probabilities are symmetric as shown in the figure. Since both
alphabets contain the same number of signals, there is no loss of generality
in labelling a_0 and b_0 by the symbol 0 and a_1 and b_1 by the symbol 1,
and this is usually done. It will be noticed that this description of the
binary-symmetric channel says nothing specific about the physical
transmission medium used.

One method of building an approximation to a binary symmetric channel
is shown in Fig. 3. The transmitted signals a_0 and a_1 are pulses of one or
the other of two carrier frequencies in the H-F region. If fading due to
ionospheric effects is neglected, the received signal differs from the
transmitted signal in that random noise is added at the front end of the
receiver. The latter contains two filters, each tuned to one of the two
transmitted frequencies. Let v_0 and v_1 represent the output voltages of the
two filters. The receiver output is defined to be symbol b_0 if v_0 > v_1,
and b_1 if v_1 > v_0. Since

    p(b_0 | a_1) = p(b_1 | a_0) = p

the channel is binary-symmetric. A real H-F system of this kind will at best
be only approximately binary-symmetric since the ionospheric behavior will
cause departures of the operating system from the ideal.

In Fig. 4 we add a variation to the channel of Fig. 3. We include a
third symbol b_2 in the output alphabet. In this channel b_0 is obtained if
v_0 - v_1 > T; b_1 if v_1 - v_0 > T; and b_2 if |v_0 - v_1| < T, where T
is some positive threshold voltage. Thus this channel can be described by

    p(b_0 | a_1) = p(b_1 | a_0) = p_1
    p(b_2 | a_1) = p(b_2 | a_0) = p_2
    p(b_0 | a_0) = p(b_1 | a_1) = 1 - p_1 - p_2

If p_1 = 0, then the channel is the so-called binary-erasure channel, which,
though at most an approximation to some real channels, has some convenient
mathematical properties.

Fig. 4. An approximation to the binary erasure channel.

Fig. 5. A Gaussian channel.

Fig. 6. Coding for the continuous channel.
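
The transition probabilities of Figs. 2 and 4 can be tabulated directly. The
Python sketch below is illustrative only: the function names and the
numerical values are assumptions made for the example and do not come from
the report.

```python
def binary_symmetric(p):
    """Return p(b_j | a_i) for the channel of Fig. 2 as {input: {output: prob}}."""
    q = 1.0 - p
    return {"a0": {"b0": q, "b1": p},
            "a1": {"b0": p, "b1": q}}


def threshold_channel(p1, p2):
    """Return p(b_j | a_i) for the three-output channel of Fig. 4.

    p1 is the probability of a wrong decision; p2 is the probability that the
    filter outputs differ by less than the threshold (output b2).
    """
    correct = 1.0 - p1 - p2
    return {"a0": {"b0": correct, "b1": p1, "b2": p2},
            "a1": {"b0": p1, "b1": correct, "b2": p2}}


# Hypothetical values chosen only for illustration.
bsc = binary_symmetric(0.1)
erasure = threshold_channel(0.0, 0.2)   # p1 = 0 gives the binary-erasure channel

# Each input symbol must lead to some output symbol with total probability 1.
for channel in (bsc, erasure):
    for outputs in channel.values():
        assert abs(sum(outputs.values()) - 1.0) < 1e-12
```

Nothing in either table refers to the physical medium; only the conditional
probabilities matter.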


Continuous  Channels 

The previous section considered so-called discrete channels, i.e.,
channels where the input and output alphabets are of finite size. Consider the
channel shown in Fig. 5. Here the input alphabet is the set of all time functions
in the interval 0 < t < T with average power ≤ S. The channel perturbation
takes the form of white gaussian noise with power density N_0 watts per cycle.
The received signal is obtained by adding to the transmitted signal an output
sample of the noise generator. This is the so-called white gaussian channel.

If the signals and noise are restricted to a bandwidth W, then the channel is
the band-limited gaussian channel. Clearly a gaussian channel can be converted
into a binary-symmetric channel by selecting two particular time functions for
the input alphabet and by defining the output alphabet as in Fig. 3.

The channel of Fig. 5 is an example of a continuous channel. It is useful
for our purposes not because it is practical in its own right but because it is a
limiting case against which we can compare the more practical discrete channels.
It is in terms of this continuous channel that we shall introduce the coding
theorem of Shannon.

We code for this continuous channel by selecting 2^k = 2^{RT} input
waveforms of length T and labelling these waveforms X_1, ..., X_{2^k}. If
symbol X_j is transmitted, the received waveform is Y = X_j + n, where n
is a noise waveform. The receiver thereupon decodes by computing the numbers
P(X_1 | Y), P(X_2 | Y), ..., P(X_{2^k} | Y). That is, given that Y was
received, the receiver computes the probability that each of the various input
symbols was transmitted, based upon its knowledge of the channel. The
receiver or decoder output is defined in terms of these probabilities: the
receiver output is the symbol Y_i if P(X_i | Y) is the largest probability.
In other words, the receiver computes that symbol which was most probably
transmitted based upon the evidence furnished by the received waveform Y.
If symbol X_j was transmitted and symbol Y_i is received where i ≠ j, then
the receiver makes an error.

The system we have been describing is shown in Fig. 6. In it, a data
source emits R bits per second. These data bits are shifted serially into a
k = RT cell shift register. The shift register is filled in T seconds. Each
of the 2^k values of the register is associated with one of the 2^k input
symbols X_j of the code. Hence every T seconds one of the waveforms X_j is
transmitted and one of the symbols Y_i is received. Let P_e(T) be the
probability of error, that is, the probability that the receiver makes an
incorrect decision.

Now suppose we repeat the above process by selecting a different set
of 2^k waveforms of length T as the code. We obtain in this way another
probability of error. Suppose that we make all possible selections of 2^k
waveforms and compute the probability of error for each code.

Let us hold R fixed and repeat this process for a larger value of T
or equivalently a larger value of k = RT. Shannon's coding theorem states
that there exist codes (selections of waveforms) for which P_e(T) tends to 0
as T becomes large, provided that the rate R is less than some maximum
value, called the channel capacity C. If R > C then there is no way of
making P_e(T) tend to 0 for large T. This behavior of the error probability
can be stated more strongly: there exist codes of length T seconds where the
probability of error P_e behaves according to

    P_e ≈ 2^{-T E_1(R)},   R < C

where the exponent E_1(R) is a function having the general behavior shown in
Fig. 7. Here E_1(R) is plotted against the rate R. It has some positive value
at R = 0 and decreases to 0 when R = C, the channel capacity. Since the
probability of error depends upon the product E_1 T, there is a choice as to
how to obtain a given P_e. If one chooses to operate at very low rates with
respect to channel capacity, then E_1 is large and T is correspondingly small.
For operation at rates which are a large fraction of capacity, E_1 is small and
the necessary T becomes correspondingly large. Thus one can obtain an
arbitrarily small P_e for a given R < C, provided one is willing to make T
large enough. But this implies generating 2^{RT} waveforms at the transmitter
and computing the largest of 2^{RT} numbers at the receiver. In general 2^{RT}
is too large for practical exploitation of the continuous channel.
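
The trade between exponent and block length can be made concrete with a
small calculation. The following Python sketch treats the approximate
relation P_e = 2^{-T E(R)} as an equality; the rates, exponents, target error
probability, and function names are all hypothetical and are not read from
Fig. 7.

```python
import math


def block_length(target_pe, exponent):
    """Smallest T (seconds) for which 2**(-T * exponent) <= target_pe."""
    return math.log2(1.0 / target_pe) / exponent


def information_bits(rate, duration):
    """k = R * T, so the code contains 2**k waveforms."""
    return rate * duration


TARGET = 2.0 ** -20   # hypothetical target error probability (about 1e-6)

# Hypothetical operating points (R in bits per second, E(R) in 1/seconds):
# a low rate with a large exponent and a high rate with a small exponent.
for rate, exponent in [(10.0, 10.0), (100.0, 1.0)]:
    T = block_length(TARGET, exponent)
    k = information_bits(rate, T)
    print(f"R = {rate}, E = {exponent}: T = {T} s, k = {k}, 2**k waveforms")
```

Under these assumed numbers the high-rate operating point requires a choice
among 2^2000 waveforms per block, which illustrates why the continuous
channel cannot be exploited directly.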

Channels  with  Discrete  Input  and  Continuous  Output 

The first step in making practical use of the coding theorem is to limit
the size of the channel input alphabet to some reasonable number, say
m = 2^ℓ, where ℓ is generally less than 5 or 6. In the simplest case
ℓ = 1 and we have a binary input alphabet. Suppose these binary symbols
are each τ seconds long. Then in time T we can transmit n = T/τ binary
channel symbols or a total of 2^n different sequences of channel symbols in
time T. To code for the channel we select 2^k = 2^{RT} of these 2^n sequences.
Such a code is often called an (n, k) block code. This is to be contrasted with
the selection of 2^k of the infinite number of different waveforms of length T
in the continuous channel.

More generally, if the input alphabet size is 2^ℓ and each waveform is
of length τ, then there are 2^{ℓn} possible sequences of length n = T/τ.
Obviously, k ≤ ℓn or k/(ℓn) ≤ 1. The fraction k/(ℓn), the ratio of the number
of information bits per block to the number of equivalent binary symbols per
block, is often called the rate r in bits per transmitted binary symbol.

When the input alphabet of the channel is restricted in this way the
picture of the transmitting terminal presented in Fig. 6 for the continuous
channel changes. In the continuous case the channel alphabet consisted of 2^k
waveforms, one of which was selected for each set of k bits received from the
source. For the finite input alphabet (binary case) we have the picture shown
in Fig. 8. For every set of k bits received from the source, n bits are
generated. Generally one can think of the n bits as being composed of the k
information bits to which are appended n - k parity check digits which are
linear functions of the k information bits. These n digits are taken serially
to select one of the two waveforms forming the channel input alphabet. In the
continuous case the encoder is simply a device for selecting one of 2^k
waveforms. In the discrete input alphabet case, the encoder first performs a
transformation on the k input digits and then makes n waveform selections
in sequence.

Fig. 7. Error exponents (exponent E plotted against rate R).

Fig. 8. A binary encoder.

Fig. 9. Decoder for binary input-continuous output channel.

In the continuous channel (Fig. 6) the receiver computed P(X_j | Y)
for each of the waveforms X_j. In the discrete input case each X_j is now a
sequence of n elementary channel waveforms. Under the assumption that
the channel is memoryless, P(X_j | Y) can be expressed as the product of the
corresponding probabilities of the symbols making up the sequence, i.e.,
P(X_j | Y) is of the form p(x_1 | y_1) p(x_2 | y_2) ... p(x_n | y_n), where the
transmitted message is the sequence x_1, x_2, ..., x_n and the received
message is the sequence y_1, y_2, ..., y_n. Each of the elementary signals
x_i has one of two (in general 2^ℓ) forms. However, each of the y_i is still
obtained by adding a noise sample to the transmitted waveform. Thus y_i is
still an arbitrary waveform of length τ.

The receiver (Fig. 9) makes its decision on a block basis. It receives
n waveforms y_1, y_2, ..., y_n. The channel makes a tentative decision on
each received symbol. A decoder which follows the channel makes a final
decision on the block of n channel decisions. More precisely, when the first
symbol y_1 of the block is received, the channel computes p(a_0 | y_1) and
p(a_1 | y_1), i.e., the probabilities that the first transmitted symbol was
a_0 and a_1 respectively. These two probabilities are stored in the decoder.
The same computation is made for each of the n symbols of the block. The
decoder, of course, knows the code, i.e., which 2^k of the 2^n possible
sequences were selected. It computes the probabilities of these sequences by
taking the appropriate products of the symbol probabilities and gives as its
final output the sequence with highest probability. Suppose, for example,
that k = 2 and n = 3 or equivalently that 4 of the 8 possible sequences of
3-digit binary symbols may be transmitted. Let these 4 sequences be


    X_1 = 0 0 0
    X_2 = 0 0 1
    X_3 = 0 1 0
    X_4 = 1 0 0

If the received sequence is Y = y_1, y_2, y_3, then the decoder computes

    p(X_1 | Y) = p(a_0 | y_1) p(a_0 | y_2) p(a_0 | y_3)
    p(X_2 | Y) = p(a_0 | y_1) p(a_0 | y_2) p(a_1 | y_3)
    p(X_3 | Y) = p(a_0 | y_1) p(a_1 | y_2) p(a_0 | y_3)
    p(X_4 | Y) = p(a_1 | y_1) p(a_0 | y_2) p(a_0 | y_3)

and decodes the sequence with highest probability.
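
The k = 2, n = 3 computation can be sketched in a few lines of Python. The
symbol probabilities below are hypothetical channel outputs invented for the
example, and the function names are likewise illustrative; only the four
code words come from the text.

```python
from math import prod

# The four code words of the example, as tuples of binary symbols.
CODE = {"X1": (0, 0, 0), "X2": (0, 0, 1), "X3": (0, 1, 0), "X4": (1, 0, 0)}


def sequence_probabilities(symbol_probs, code=CODE):
    """Multiply the stored symbol probabilities along each code word.

    symbol_probs[t] is the pair (p(a_0 | y_t), p(a_1 | y_t)).
    """
    return {name: prod(symbol_probs[t][bit] for t, bit in enumerate(word))
            for name, word in code.items()}


def decode(symbol_probs, code=CODE):
    """Return the name of the code word with the largest product."""
    probs = sequence_probabilities(symbol_probs, code)
    return max(probs, key=probs.get)


# Hypothetical channel outputs for y_1, y_2, y_3.
received = [(0.9, 0.1), (0.6, 0.4), (0.2, 0.8)]
products = sequence_probabilities(received)
decision = decode(received)
```

The decoder stores 2n = 6 numbers and forms 2^k = 4 products; the products
are used only for comparison with one another.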

Another way of describing the same computation is in terms of a
so-called "distance" or "metric". Clearly it makes no difference whether the
receiver computes p(X_j | Y) or some monotone function of p(X_j | Y), say
D(X_j, Y) = -log p(X_j | Y). Maximizing p(X_j | Y) is equivalent to minimizing
the distance D(X_j, Y) [if p(X_j | Y) = 1, then D(X_j, Y) = 0]. Since
p(X_j | Y) = p(x_1 | y_1) p(x_2 | y_2) ... p(x_n | y_n), it follows that

    D(X_j, Y) = -log p(X_j | Y)
              = -log p(x_1 | y_1) - log p(x_2 | y_2) - ... - log p(x_n | y_n)
              = d(x_1, y_1) + d(x_2, y_2) + ... + d(x_n, y_n).

Thus the "distance" between two sequences X and Y is the sum of the
"distances" between the corresponding symbols making up the sequences.

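As a quick check that minimizing the summed distance selects the same
sequence as maximizing the product, consider the Python sketch below. The
symbol probabilities are hypothetical, the base-2 logarithm is an assumption
(the text does not fix the base), and the function names are illustrative.

```python
import math

CODE_WORDS = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)]

# Hypothetical channel outputs: entry t holds (p(a_0 | y_t), p(a_1 | y_t)).
SYMBOL_PROBS = [(0.9, 0.1), (0.6, 0.4), (0.2, 0.8)]


def symbol_distance(bit, probs):
    """d(x, y) = -log2 p(x | y) for a single channel symbol."""
    return -math.log2(probs[bit])


def sequence_distance(word, symbol_probs):
    """D(X, Y): the sum of the symbol distances along the sequence."""
    return sum(symbol_distance(bit, probs)
               for bit, probs in zip(word, symbol_probs))


def sequence_probability(word, symbol_probs):
    """p(X | Y) as the product of the symbol probabilities."""
    return math.prod(probs[bit] for bit, probs in zip(word, symbol_probs))


nearest = min(CODE_WORDS, key=lambda w: sequence_distance(w, SYMBOL_PROBS))
likeliest = max(CODE_WORDS, key=lambda w: sequence_probability(w, SYMBOL_PROBS))
```

Working with sums rather than products is convenient because the distance of
a sequence can be accumulated one symbol at a time.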

Probability  of  Error 

What can be said about the behavior of the probability of error for this
channel with discrete input alphabet? The essential difference between this
channel and the continuous channel is the fact that 2^k waveforms are selected
out of 2^n possible waveforms of length T rather than out of the set of all
functions of length T. One certainly cannot expect an improvement in
performance by restricting the class of possible waveforms. It turns out that
the general form of Shannon's theorem still holds, that is

    P_e ≈ 2^{-T E_2(R)}

where the new exponent E_2 is everywhere smaller than the exponent E_1 of
the continuous case. The more the set of waveforms is restricted, the smaller
the value of the exponent for each value of R. This is shown in Fig. 7.


Output  Quantization 


In the channel with discrete input the decoder stores 2n probabilities
or "distances" obtained from the channel output and makes 2^k computations
involving these 2n numbers. Not only is 2^k a frightening number
of computations for useful values of k, but in addition storage for 2n
distances or probabilities may also be unattractive. One way of reducing the
storage requirements is by "quantizing" the channel output probabilities. The
channel with which we have been dealing in this section has a binary input
alphabet but an output alphabet consisting of 2 continuously variable
probabilities or distances. If we use, say, 10 bits to specify each probability
or distance (0.1% accuracy) then 20 bits of storage are required to store a
single channel output. If this represents too much storage for the decoder,
then the distances may be stored with fewer bits and therefore with less
precision. The binary symmetric channel represents an extreme case. Here we
retain just one bit at the output which tells whether p(a_0 | y) is greater or
smaller than p(a_1 | y). If p(a_0 | y) > p(a_1 | y), the distance
d(a_0, y) = 0 and d(a_1, y) = 1; otherwise the two values are interchanged.
In this case of maximum quantization the channel is said to make a "hard
decision"; it calls the received symbol a_0 or a_1 according to which decision
is more probable. The distance defined in this case is the so-called Hamming
distance.
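
The hard-decision case can be sketched as follows. The symbol probabilities
and helper names are hypothetical; the code words are those of the k = 2,
n = 3 example given earlier. Ties between the two probabilities are resolved
arbitrarily in favor of a_1 in this sketch.

```python
CODE_WORDS = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)]


def hard_decision(p_a0, p_a1):
    """Keep one bit per received symbol: 0 if a_0 is more probable, else 1."""
    return 0 if p_a0 > p_a1 else 1


def hamming_distance(word, hard_bits):
    """Number of positions in which the code word and the hard decisions differ."""
    return sum(1 for x, y in zip(word, hard_bits) if x != y)


# Hypothetical channel outputs (p(a_0 | y_t), p(a_1 | y_t)) for one block.
received = [(0.9, 0.1), (0.6, 0.4), (0.2, 0.8)]
hard_bits = [hard_decision(p0, p1) for p0, p1 in received]
distances = {word: hamming_distance(word, hard_bits) for word in CODE_WORDS}
best = min(distances, key=distances.get)
```

Only n bits are stored per block instead of 2n multi-bit numbers, at the cost
of discarding how confident each symbol decision was.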

It seems reasonable that the less information the decoder retains on
which to base its final decision, the poorer will be its performance. Again,
even with channel output quantization, Shannon's theorem still takes the same
form

    P_e ≈ 2^{-T E_3(R)}

where E_3 (shown in Fig. 7) is smaller than E_2, how much smaller depending
upon how much information is discarded by the channel.

Central  Problems  of  Coding  Theory 

What  we  have  attempted  to  show  to  this  point  is  that  given  a 
communications  medium  and  a  continuous  channel  using  that  medium,  it  is 
possible  to  use  the  channel  reliably  and  efficiently  (Shannon's  theorem  for 
continuous  channels).  To  obtain  the  kind  of  performance  that  we  want, 
continuous  channels  are  impractical.  We  convert  the  continuous  channels  to 
discrete  channels  by  input  quantization  (limiting  the  number  of  possible  input 
symbols)  and  by  output  quantization  (limiting  the  amount  of  information  retained 
by  the  decoder  per  symbol).  Both  forms  of  quantization  simplify  the  encoding 
and  decoding  equipments  at  the  cost  of  performance. 

One of the central problems in communication theory is finding discrete
channels and codes which are both practical and which do not degrade the
performance excessively from the continuous channel. By finding channels and
codes we mean finding input and output quantizations and then making "good"
selections of sequences of channel symbols so that the resulting error exponent
E_3(R) does not depart excessively from the optimum exponent E_1(R).

The other problem, and the one to which we devote the remainder of
this note, is the problem of decoding. Regardless of the channel quantization,
the method of decoding described up until now has required 2^k comparisons
at the decoder leading to the selection of the most probable sequence. As long
as this "maximum likelihood" method of decoding is retained, this number of
computations is fundamental. Another method of decoding has been found by
Wozencraft which yields the same error exponent as maximum likelihood
decoding but requires far less computation. This technique is called
sequential decoding and its properties are described in the rest of this note.

Sequential  or  Convolutional  Encoding 

The essence of sequential decoding is the replacement of the "jumping"
constraint of block coding by a "sliding" constraint. In an (n, k) block code,
a block of k information bits is mapped into a block of n binary symbols by
the generation of n - k parity check symbols which depend upon the k
information bits (Fig. 8). The next block in sequence depends upon the next
k information bits and thus is completely independent of all previous blocks.

In a sequential code with the same parameters, check symbols are interspersed
between successive information digits with each check symbol dependent upon
the previous k information bits.

We show this in Fig. 10 for k = 4, n = 8, or r = 1/2. On line (a) we
show an infinite stream of information bits m_i as emitted by an information
source. On lines (b) and (c) we show the dependencies of the transmitted
symbols by brackets. Line (b) represents a block coding scheme: message
bits m_1 through m_4 are transmitted together with check digits c_1 through
c_4 to form an 8 digit block. The four parity check digits depend only on
m_1 through m_4. The four check digits c_5 through c_8 in the next block
depend only on the message digits m_5 through m_8 of that block. (Clearly,
with the dependency specified in this way it makes no difference whether the
check digits are uniformly distributed in the block, clumped at one end, or
arranged in any other way.) On line (c) the sequential configuration is
shown. Check digit c_4 is dependent upon m_1 through m_4, c_5 on m_2
through m_5, c_6 on m_3 through m_6, etc.

It is convenient to think of the sequential encoding process in terms of
a coding "tree". To see how this comes about we refer to Fig. 10d and begin
by assuming that we have just taken m_4 from the source and generated check
digit c_4. We have two choices for m_5, 0 or 1. Once m_5 has been selected,
the check digit c_5 is completely determined by m_5 and the previous message
bits m_2, m_3 and m_4. This is represented in Fig. 10d as follows.

Following m_4, c_4 is a branching corresponding to the two possible choices
for m_5. The check symbol c_5 follows m_5 on each of the two branches. The
arrival of m_6 causes another branching with the indicated dependence of c_6
on the four message bits m_3, m_4, m_5 and m_6. This process continues
indefinitely.

Thus the sequential encoding process transforms an infinite sequence
of information bits into an infinitely long tree with each branching
corresponding to the two possible values of each information bit. The check
digits appearing next to the information bit defining a branch are functions of
that information bit and the k - 1 preceding information bits. Figure 10d shows
a tree for r = 1/2. If r = 1/3 then each message bit is followed by two
check digits on each branch with each check digit a different function of the
same k variables. In all cases, a particular input information sequence is
transformed into a particular path through the tree.

A sequential code of this type is easily generated with a circuit similar
to that of Fig. 10e. Message bits are shifted sequentially through a k = 4
storage cell register from right to left. A parity network connected to the
register generates the check digits. In Fig. 10e, digits m_1 through m_4 are
contained in the register, hence the parity network generates c_4. The output
sequence is formed by switching alternately to the latest message digit in the
rightmost cell of the register and to the parity output, the switching occurring
at twice the input data rate (r = 1/2). Since the output sequence is obtained
by shifting a message sequence past a fixed parity network, the encoder is
referred to as a convolutional encoder.

The process is begun by assuming an initial sequence of k - 1 0's
preceding the message sequence. Thus the first check symbol c_1 is
c_1(0, 0, 0, m_1). In terms of the circuit of Fig. 11 this is accomplished by
clearing the shift register before shifting in the first message digit.
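
The shift-register encoder described above can be sketched in Python as
follows. The parity connections (TAPS) are a hypothetical choice made only
for illustration, as are the names; the report does not specify the parity
network, and a modulo-2 sum of the tapped cells is assumed here.

```python
# Hypothetical parity connections for a k = 4 register, oldest cell first.
TAPS = (1, 0, 1, 1)


def encode(message, taps=TAPS):
    """Rate-1/2 convolutional encoding: each message bit is followed by one
    check digit computed from the current register contents."""
    k = len(taps)
    register = [0] * k                    # cleared register before the first bit
    output = []
    for m in message:
        register = register[1:] + [m]     # shift left; new bit enters at the right
        check = sum(t & b for t, b in zip(taps, register)) % 2
        output.extend([m, check])         # alternate message digit and check digit
    return output


first_block = encode([1, 0, 0, 0])
```

Because the register is cleared first, the check digit accompanying m_1 is
computed from (0, 0, 0, m_1), as in the text.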

All the above remarks and those which follow, while stated for binary
codes, apply equally well to other alphabets and tree structures.

Some Properties of Trees

It  is  clear  from  the  previous  section  that  once  parameters  k  and  n 
(or  k  and  r)  have  been  established  and  the  parity  generating  functions  have 
been  selected,  the  code  tree  is  completely  determined.  Having  fixed  k  and 
r,  the  problem  of  finding  "good"  codes  is  the  problem  of  finding  "good"  parity 
networks.  The  "goodness"  of  a  tree,  or  of  the  parity  networks  generating  the 
tree,  is  determined  by  how  "unlike"  the  paths  through  the  tree  are  with  respect 
to  a  decoding  "distance",  which  as  shown  above  is  related  to  the  channel 
conditional  probabilities. 

The tree repeats every k branches. This is seen in Fig. 10d. At m_6,
the parity symbol c_6 depends on m_3, m_4, m_5 and m_6. It is therefore
independent of the branching at m_2. Hence, if we compare two infinite
message sequences s_1 and s_2 which are identical except for one bit, say
m_2, the transmitted sequences (or paths) x_1 and x_2 corresponding to s_1
and s_2 are identical except for the branches from m_2 through m_5. If the
two sequences s_1 and s_2 differ in two bits m_2 and m_5, the branches of
x_1 and x_2 do not become identical until m_9. In general if two message
sequences differ in some digit, the corresponding paths do not agree for at
least k - 1 additional branches.

If s_1 and s_2 differ only at m_2, then the distance between x_1 and
x_2 increases until m_6 is reached. Thereupon the distance remains constant.
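
The divergence and re-merging of paths can be observed with a small
experiment. In the Python sketch below the parity network (a modulo-2 sum of
all four register cells) and the all-zero reference message are hypothetical
choices, and Hamming distance between branches is used as the decoding
distance; with other parity connections some intermediate branches may
happen to coincide.

```python
def branches(message, taps=(1, 1, 1, 1)):
    """Return the (message bit, check digit) pair on each branch of the path."""
    register = [0] * len(taps)
    path = []
    for m in message:
        register = register[1:] + [m]
        check = sum(t & b for t, b in zip(taps, register)) % 2
        path.append((m, check))
    return path


def differing_branches(s1, s2):
    """1-based indices of the branches on which the two paths disagree."""
    pairs = zip(branches(s1), branches(s2))
    return [i + 1 for i, (b1, b2) in enumerate(pairs) if b1 != b2]


def cumulative_distance(s1, s2):
    """Running Hamming distance between the two paths, branch by branch."""
    total, running = 0, []
    for b1, b2 in zip(branches(s1), branches(s2)):
        total += sum(u != v for u, v in zip(b1, b2))
        running.append(total)
    return running


s1 = [0] * 10
s2 = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]      # differs from s1 only in m_2
s3 = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]      # differs from s1 in m_2 and m_5

one_bit = differing_branches(s1, s2)
two_bits = differing_branches(s1, s3)
growth = cumulative_distance(s1, s2)
```

With these assumptions the single-bit difference affects branches m_2 through
m_5 only, and the accumulated distance stops growing from m_6 onward.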

Sequential  Decoding  -  General  Principles 

As  we  have  pointed  out  above,  the  sequential  encoding  process  differs 
from  block  encoding  in  one  major  respect:  the  jumping  constraint  of  block 
encoding  is  replaced  by  the  sliding  constraint  of  sequential  or  convolutional 
encoding.  The  result  of  the  sequential  encoding  process  is  the  transformation 
of  an  infinite  information  sequence  into  an  infinitely  long  path  through  a  tree. 

The  sequential  decoding  process  is  therefore  the  reconstruction  of  the  path 
corresponding  to  the  transmitted  sequence  based  upon  the  information  contained 
in  the  received  sequence.  The  criterion  by  which  a  sequential  decoder  selects 
the  correct  path  differs  from  the  criterion  by  which  the  block  decoder  selects 
the  correct  sequence  in  a  fundamental  way.  The  block  decoder  uses  the 
criterion  of  maximum  likelihood  or  some  variation  thereof.  Sequential  decoding 
imposes  a  threshold  based  upon  an  assumption  about  the  noise  characteristics. 
Thus, whereas block decoding looks for the most probable sequence, sequential
decoding looks for a sequence which is "sufficiently probable" with respect to
some threshold.

It  is  these  two  fundamental  differences  between  sequential  and  block 
operation  that  provide  the  basis  for  decoding  or  tree  search  algorithms  which 
drastically  reduce  the  number  of  decoding  operations  below  the  block  code 
requirements  while  retaining  the  same  error  probability  behavior  as  in  the 
block  code.  The  undesirable  feature  that  results  from  these  same  properties 
is  the  catastrophic  nature  of  decoding  errors.  That  is,  once  an  incorrect 
branch  is  chosen  (without  the  opportunity  to  change  the  decision)  the  probability 
of  the  decoder  correcting  itself  (finding  the  right  path)  is  low. 

Given the fundamental properties of sequential operation, i.e., coding
trees and threshold decoding, how does a decoder operate? The basic
physical law which governs this operation is the probabilistic law of large
numbers, which says, roughly, that the more observations made of a random
process the more reliable are the inferences that can be made about the process.
Every  received  channel  symbol  provides  a  sample  of  the  channel  noise.  The 
longer  the  sequence  of  symbols  that  the  decoder  observes  the  more  samples 
of  channel  noise  are  being  observed  and  the  more  "typical"  the  noise  should 
look.  A  short  sequence  of  symbols  may  exhibit  atypical  noise;  the  longer  the 
sequence  the  less  probable  it  is  that  the  noise  looks  atypical.  Suppose  that  the 
decoder  is  on  the  correct  path.  An  atypical  noise  event  can  make  the  correct 
path  look  "too  improbable"  for  a  short  time.  The  longer  the  observation  of  the 
correct  path  the  more  likely  it  is  to  exhibit  predicted  behavior.  On  the  other 
hand,  when  the  decoder  takes  an  incorrect  path,  in  the  absence  of  channel 
noise,  the  longer  the  observation  the  less  probable  the  path  appears.  Again 
an  atypical  noise  event  can  make  an  incorrect  path  appear  more  probable  than 
the  correct  path.  However,  the  longer  the  observed  sequence  the  less  probable 
atypical  noise  behavior  becomes  and  the  smaller  the  probability  that  an 
incorrect  path  looks  "sufficiently  probable".  It  also  follows  that  the  greater  the 
distance  that  a  decoder  proceeds  along  an  incorrect  path  the  more  computations 
will  be  required  to  rectify  the  error.  Thus  the  probability  of  remaining  on  an 
incorrect  path  for  a  given  path  length  decreases  with  the  length  and  the  number 
of  operations  to  rectify  the  error  increases  with  length.  It  turns  out  that  one 
may  find  tree  search  schemes  in  which  the  average  number  of  computations  is 
low  or  equivalently  schemes  in  which  the  probability  of  long  incorrect  paths  is 
sufficiently  small. 
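
The law-of-large-numbers argument above can be checked numerically with a
rough sketch. It assumes a binary symmetric channel with crossover
probability p, and it models the digits of an incorrect path as differing
from the transmitted digits independently with probability 1/2; both the
model and the name distance_per_digit are assumptions for illustration, not
part of this report.

```python
import random

def distance_per_digit(n, p, on_correct_path, rng):
    """Fraction of n received digits that disagree with the hypothesized path."""
    disagreements = 0
    for _ in range(n):
        noise = rng.random() < p                               # channel flips this digit
        # Assumed model: an incorrect path differs from the transmitted digit half the time.
        differs = (not on_correct_path) and rng.random() < 0.5
        disagreements += noise != differs                      # received digit vs. hypothesis
    return disagreements / n

p = 0.05
rng = random.Random(0)
for n in (10, 100, 2000):
    correct = distance_per_digit(n, p, True, rng)
    incorrect = distance_per_digit(n, p, False, rng)
    print(n, round(correct, 3), round(incorrect, 3))

correct_long = distance_per_digit(2000, p, True, random.Random(1))
incorrect_long = distance_per_digit(2000, p, False, random.Random(2))
```

For short sequences the two fractions can overlap; for long sequences the
correct path settles near p and the incorrect path near 1/2, so a threshold
between the two separates them reliably.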

Sequential Decoding Algorithms

As  the  decoder  proceeds  to  reconstruct  a  path  through  a  tree  based 
upon  a  received  sequence,  it  compares  the  probability  of  the  path  that  it  is 
currently  exploring  against  a  threshold  determined  by  the  current  noise  level 
that  the  decoder  expects.  If  the  comparison  is  favorable  (the  path  looks 
"sufficiently  probable"),  the  decoder  continues  forward  in  the  tree.  If  the 
comparison  is  unfavorable  then  either  the  decoder  is  on  the  wrong  path  because 
of  an  atypical  noise  event  that  occurred  sometime  previously  or  the  decoder  is 
on  the  correct  path  and  an  atypical  noise  event  is  now  occurring. 

At  this  point  the  decoder  makes  the  assumption  that  the  path  is  incorrect. 
The  decoder  reverses  itself  and  searches  back  in  the  attempt  to  find  a  more 
probable  path.  The  distance  it  is  allowed  to  go  back  in  this  search  depends 
upon  the  particular  algorithm.  For  the  moment  assume  that  the  decoder 
searches  back  until  it  either  finds  a  "sufficiently  probable"  path  or  it  retreats 
some  fixed  number  d^  nodes  back  without  finding  a  good  path.  The  former 
case  implies  that  the  first  path  was  indeed  incorrect  as  hypothesized  and  that 
the  new  path  is  more  likely  correct.  The  decoder  thereupon  proceeds  forward. 
The  latter  case  implies  that  the  first  path  may  still  be  correct  and  only  appears 
improbable  because  of  atypical  noise  behavior.  The  decoder  relaxes  its 
criterion of "sufficiently probable" and proceeds as before with a new
threshold based upon the current assumption of the noise level.
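
The general procedure just described (go forward while the path is
"sufficiently probable", back up when it is not, and relax the criterion
when the search fails) can be sketched as follows. This is a simplified
illustration and not the algorithm of any particular machine. It assumes a
small convolutional code with arbitrary tap patterns, a straight-line
threshold offset + slope x depth on the distance, and a search that may
retreat all the way to the first node rather than a fixed number of nodes.
The names branch, search and decode are hypothetical.

```python
# Hypothetical tap patterns (oldest digit first); constraint length k = 4, 3 digits per branch.
TAPS = [(1, 0, 1, 1), (1, 1, 0, 1), (1, 1, 1, 1)]

def branch(bits, taps=TAPS):
    """Channel digits on the branch reached by the message digits `bits`."""
    k = len(taps[0])
    window = ([0] * k + list(bits))[-k:]           # last k digits, zero-filled at the start
    return tuple(sum(g * m for g, m in zip(tap, window)) % 2 for tap in taps)

def search(received, slope, offset, path=(), dist=0):
    """Depth-first search for a full-length path that never exceeds the threshold line."""
    depth = len(path)
    if depth == len(received):
        return path
    options = []
    for bit in (0, 1):
        digits = branch(path + (bit,))
        step = sum(a != b for a, b in zip(digits, received[depth]))
        options.append((dist + step, bit))
    for d, bit in sorted(options):                 # try the better branch first
        if d <= offset + slope * (depth + 1):      # path is "sufficiently probable"
            found = search(received, slope, offset, path + (bit,), d)
            if found is not None:
                return found
    return None                                    # both branches failed: back up one node

def decode(received, slope=0.5, relax=1.0):
    """Repeat the search, relaxing the threshold each time it fails."""
    offset = 0.0
    while True:
        found = search(received, slope, offset)
        if found is not None:
            return list(found), offset
        offset += relax

message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
sent = [branch(message[:t + 1]) for t in range(len(message))]
received = [list(b) for b in sent]
received[5][0] ^= 1                                # two isolated channel errors
received[10][2] ^= 1
decoded, offset = decode(received)
```

With two isolated channel errors the correct path stays under the tightest
threshold and is recovered without any relaxation. Heavier noise would force
the search to back up or to relax the threshold.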

The  Wozencraft  or  SECO  Algorithm 

The  remarks  of  the  previous  section  are  quite  general  and  apply  to 
any  sequential  decoding  algorithm.  We  must  now  become  more  specific  and 
pin  down  some  of  the  parameters  that  define  a  decoding  algorithm.  In  this 
section  we  consider  the  SECO  algorithm  which  is  a  variation  of  the  original 
Wozencraft  scheme.  In  the  next  section  we  shall  discuss  the  Fano  decoding 
algorithm. 

Three properties of a decoding algorithm which were left either vague
or unmentioned in the previous section must be specified: first, the definition
of "sufficiently probable" as applied to the decoding threshold; second, the
number of nodes the decoder is allowed to search back in the attempt to find
a good path before relaxing the threshold; and third, the criteria used to make
final or irrevocable decisions about branching nodes, and the criteria for
releasing the decoded digits to the customer.

In the SECO algorithm all three of these properties are tied to the
constraint length k. Referring again to the tree of Fig. 10d, suppose that the
path up to a given node is correct and that the decoder is about to make a
decision on the message digit at that node. We divide the tree into two
subsets of equal size stemming from the two branches at that node. We call the
subset containing the correct path the correct subset and we call the subset
stemming from the incorrect branch the incorrect subset. The basic operation
of the SECO procedure is to distinguish between the two subsets and thereby
decode the digit. To do this we
fix a small fraction P_1, and then define a distance function K_1(ℓ) (Fig. 11)
as follows: the probability that any sequence ℓ branches long in the incorrect
subset will be closer to the corresponding subsequence of the received message
than K_1(ℓ) is less than P_1. For a larger probability P_2 we obtain a function
K_2(ℓ) which is larger than K_1(ℓ) for all values of ℓ.

The functions K_j(ℓ), 0 < ℓ < k, are then used as the criteria of the
previous section. If a path of length ℓ is found closer to the received sequence
than K_j(ℓ), this path is defined to be "sufficiently good" since the probability
that this path is in the incorrect subset is less than P_j. When a path of length
k is found to be "sufficiently good" then the message bit defining the beginning
of that path is decoded. If a path of length k is not good enough, then the
decoder goes backward and then forward taking alternate branches in a
systematic search for a sequence of length k that meets the criterion. If the
decoder has examined all branches back to that of the digit being decoded and
has been unsuccessful, then the assumption is made that the noise is atypical,
the decoder returns to its old path, relaxes the criterion from the current
function to the next larger one, and begins again. In subsequent decodings the
criterion is lowered whenever possible. The criteria for releasing the decoded
digits to the user are discussed later.
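
One way in which criterion functions of this kind might be computed is
sketched below. The construction is an assumption made for illustration and
is not claimed to be the one used in SECO: each sequence in the incorrect
subset is treated as random, so that its distance from the received digits
is binomial with parameter 1/2, and the probability that any of the
2^(ℓ-1) sequences is too close is bounded by a union bound. The function
name criterion and the parameter n0 (channel digits per branch) are
hypothetical.

```python
from math import comb

def criterion(ell, n0, p_bound):
    """Largest K such that P(some incorrect sequence has distance < K) is below p_bound.

    Assumes p_bound < 1, random incorrect sequences, and a union bound over
    the 2**(ell - 1) sequences of ell branches in the incorrect subset.
    """
    n = ell * n0                                   # channel digits in ell branches
    paths = 2 ** (ell - 1)
    K, below = 0, 0.0                              # below = P(distance < K) for one sequence
    while K <= n:
        below_next = below + comb(n, K) / 2 ** n   # P(distance < K + 1)
        if paths * below_next >= p_bound:
            break
        K, below = K + 1, below_next
    return K

P1, P2 = 1e-4, 1e-2                                # made-up values with P2 > P1
K1 = [criterion(ell, 3, P1) for ell in range(1, 13)]
K2 = [criterion(ell, 3, P2) for ell in range(1, 13)]
print(K1)
print(K2)
```

Because a larger allowed probability admits every distance that a smaller
one does, the curve for P2 is never below the curve for P1, in agreement
with the relation between K_1(ℓ) and K_2(ℓ) stated above.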

The  Fano  Decoding  Algorithm 

In  the  SECO  algorithm,  the  final  decoding  of  a  digit  and  the  tree  search 
are  intimately  related.  A  digit  is  decoded  whenever  a  good  path  k  branches 
long  stemming  from  that  digit  is  found  since  that  path  belongs  to  the  correct 
subset  with  very  high  probability.  Fano's  algorithm  divorces  the  decoding 
from  the  tree  search.  His  decoder  always  attempts  to  find  the  most  probable 
path through the tree. The time at which a digit is decoded is determined by
other conditions discussed later.

Fig. 12. Fano decoder criterion functions.

The criterion functions used in Fano decoding are shown in Fig. 12.
These consist of a number of parallel lines of slope d_0 and with adjacent
intercepts differing by T_0. These curves are, of course, qualitatively
similar to those of Fig. 11. Unlike SECO the abscissa ℓ designating path
length is not limited by the constraint length but extends indefinitely. Plotted
on the same curve is the distance of the current path from the received sequence.
The current threshold is by definition the one immediately above the distance
curve. From the origin to point a the decoder proceeds forward taking the
better branch at each node and the threshold is T_1. At point a the distance
is increasing slowly enough (the noise level is low enough) for the distance to
cross the next lower threshold curve; the decoder continues past point c taking
the lower (more probable) path until the distance starts increasing rapidly and
at point b crosses the threshold curve. At this point the decoder assumes that
the path is not sufficiently probable and it searches back changing decisions
at branches in an attempt to find some path which remains below the current
threshold. In this process it goes back as far as point a and, failing to find
an exit below that curve, it returns to point b relaxing the criterion to
threshold T_1. The original path again fails, and the search this time yields
the upper path at point c which proceeds below the relaxed threshold curve.

This procedure attempts to make maximum use of the law of large
numbers, i.e., the fact that the longer the observed path the more probable
the correct path will appear and the less probable all incorrect paths will
appear. The parameters d_0 and T_0 are adjusted to give optimum performance
of the decoder. Unlike the SECO algorithm, constraint length does not appear
in the tree search procedure nor is the final decoding related to the tree search.
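
The rule that the current threshold is the line immediately above the
distance curve can be written down directly. The sketch below is only an
illustration: it assumes the family of lines j x T_0 + d_0 x ℓ for integer j
(the indexing by j is hypothetical and need not match the labels of
Fig. 12), and the distance values are made up.

```python
from math import ceil

def current_threshold(distance, ell, d0, T0):
    """Index j of the lowest line j*T0 + d0*ell that is not below the distance."""
    return ceil((distance - d0 * ell) / T0)

d0, T0 = 0.5, 2.0                                  # made-up slope and spacing
distances = [0, 0, 0, 0, 0, 0, 1, 3, 5, 8]         # made-up distance curve, one value per node
indices = [current_threshold(d, ell, d0, T0) for ell, d in enumerate(distances)]
print(indices)
```

While the distance grows more slowly than the slope d_0, the index falls
(the threshold tightens); when the distance grows faster than d_0, the index
rises, which corresponds to the points at which the decoder would have to
search back or relax its criterion.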

Waiting  Lines  and  Buffer  Storage 

There  is  no  attempt  in  this  note  to  derive  any  quantitative  properties  of 
sequential  decoding  algorithms.  It  should  be  clear,  however,  because  of  the 
basic  structure  of  the  sequential  decoding  process  that  if  the  average  noise  level 
is  sufficiently  small,  then  large  noise  perturbations  will  be  sufficiently  rare  so 
that  the  decoder  is  on  the  correct  path  most  of  the  time.  The  greater  the  noise 
level,  the  more  common  are  the  large  perturbations  and  the  more  likely  it  is 
for  the  decoder  to  make  false  starts  on  incorrect  paths  and  require  searches 
to  find  the  correct  path.  Wozencraft  has  shown  that  for  rates  below  some 
maximum called R_comp, the average number of branches observed by the
decoder remains finite. (For the binary symmetric channel, (1/2) C ≤ R_comp ≤ C,
where C is the channel capacity.)
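
The relation between R_comp and C for the binary symmetric channel can be
checked numerically. The sketch below assumes the standard cutoff-rate
expression R_comp = 1 - log2(1 + 2 sqrt(p(1 - p))) for a channel with
crossover probability p; this formula is not stated in this report and is
introduced here only for illustration, as are the function names.

```python
from math import log2, sqrt

def capacity(p):
    """Capacity of the binary symmetric channel in bits per digit (0 < p < 1)."""
    return 1.0 + p * log2(p) + (1.0 - p) * log2(1.0 - p)

def r_comp(p):
    """Assumed cutoff-rate expression for the binary symmetric channel."""
    return 1.0 - log2(1.0 + 2.0 * sqrt(p * (1.0 - p)))

for p in (0.01, 0.05, 0.1, 0.2, 0.4):
    c, r = capacity(p), r_comp(p)
    print(f"p={p:<5} C={c:.4f}  R_comp={r:.4f}  R_comp/C={r / c:.3f}")
```

The ratio R_comp/C is close to 1 for a nearly noiseless channel and falls
toward 1/2 as p approaches 1/2.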

However,  even  with  a  small  average  number  of  computations,  there 
are  individual  times  during  which  the  peak  computational  load  may  become 
large.  It  seems  reasonable,  therefore,  to  design  a  decoding  machine  which 
can  handle  the  average  computational  load  with  a  buffer  storage  to  smooth  out 
the  computational  peaks.  Suppose  that  the  decoder  contains  storage  sufficient 
for  N  branches.  We  can  represent  this  as  in  Fig.  13  by  superimposing  a 
window  N  branches  long  on  a  decoding  tree.  The  storage  represented  by  the 
window  is  used  for  a  variety  of  purposes  related  to  the  execution  of  the  decoding 
algorithm.  In  Fig.  13a,  a  number  of  nodes  in  the  tree  have  been  designated. 
Nodes P_1 and P_6 are respectively the oldest and newest nodes in the window.
P_5 is the position of the latest branch to enter the decoder from the channel;
P^  is  the  current  node  under  examination.  P^  is  the  farthest  point  to  the 
right  to  which  P^  has  advanced  and  P^  is  the  farthest  point  to  the  left  to  which 
P^  can  go  or,  in  other  words,  the  oldest  node  which  can  still  be  examined. 

The distance between P_5 and either P_3 or P_4 is indicative of the waiting
line, while the distance between P_2 and P_1 is some fixed delay between an
irrevocable decision and release of data. The entire window (hence points P_1,
P_2 and P_6) moves to the right at the rate at which digits are decoded or
passed on to the user. P_5 moves to the right at the rate at which digits are
received from the channel.

In  Fig.  13b,  the  SECO  configuration  is  shown.  The  distance  between 

P^  and  P^  is  fixed  at  the  constraint  length  k  .  The  distance  between  P^  and 

P2  is  the  length  l  of  the  path  being  examined  (the  abscissa  of  Fig.  12).  The 

waiting  line  is  defined  to  be  the  distance  between  P^  and  P^  .  A  digit  is 

decoded  when  the  k-branch  path  between  P^  and  P^  looks  "sufficiently 

probable". Thus the entire window moves to the right at the decoding rate
while P_5 moves to the right at the uniform channel rate. When P_5 reaches
P_6, the buffer overflows and the decoder must stop.
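
The interplay of a uniform channel rate, a variable decoding rate and a
finite buffer can be illustrated by a simple waiting-line simulation. The
model is an assumption for illustration only: one branch arrives per time
step, the decoder performs a fixed number of computations per step, and the
number of computations needed by each branch is supplied as a made-up list.
The name simulate_waiting_line and its parameters are hypothetical.

```python
def simulate_waiting_line(work_per_branch, speed, capacity):
    """Return (history, overflow_step) for a decoder with a finite waiting line.

    work_per_branch[i] is the number of computations branch i needs,
    speed is the number of computations performed per time step, and
    capacity is the largest waiting line the buffer can hold.
    """
    history = []
    waiting = 0                                    # branches received but not yet decoded
    next_branch = 0                                # oldest undecoded branch
    remaining = 0                                  # computations still owed on that branch
    for step in range(len(work_per_branch)):
        waiting += 1                               # one branch arrives from the channel
        budget = speed
        while waiting > 0 and budget > 0:
            if remaining == 0:
                remaining = work_per_branch[next_branch]
            spent = min(budget, remaining)
            remaining -= spent
            budget -= spent
            if remaining == 0:                     # branch finished: the window advances
                waiting -= 1
                next_branch += 1
        history.append(waiting)
        if waiting > capacity:
            return history, step                   # buffer overflow
    return history, None

work = [1, 1, 9, 1, 1, 1, 1, 1]                    # one computational peak at branch 2
small = simulate_waiting_line(work, speed=2, capacity=3)
large = simulate_waiting_line(work, speed=2, capacity=5)
print(small)
print(large)
```

The average load in this example is below the decoder speed, yet the single
peak overflows the smaller buffer, while the larger buffer absorbs it and
the waiting line then drains.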

The Fano decoder is represented in Fig. 13c. Here we may consider
P_5 and P_6 to coincide. Thus the decoding rate and the channel rate are
identical  and  quite  independent  of  the  decoder  or  tree  search  operation.  P^ 
moves  at  the  tree -search  rate.  Hence  the  buffer  overflows  when  P^  reaches  its 
left-most  point  P^  .  The  distance  between  P^  and  P^  is  the  length  I  of 
Fig. 4 and the delay d between P_1 and P_2 may be interpreted as the
minimum allowable value of ℓ at which a decoding may occur.

Fig. 13. Decoder storage.

In either algorithm, once P_2 moves past a node, the branch chosen
at that node can no longer be changed. The delay d between P_2 and the final
release at P_1 is provided for error detection. If the probability of an
incorrect branching at P_2 is not sufficiently low, the delay provides additional
time for detecting a possible error even though that error may no longer be
corrected. In the SECO machine the delay is fixed at k, the constraint
length. This restriction is not fundamental to the Wozencraft algorithm.

Probability  of  Error 

According to the previous section, a digit is released when the branch
corresponding  to  that  decision  leaves  the  decoder,  or  equivalently  when  the 
N  branch  window  defining  the  decoder  storage  passes  the  branch.  Thus  a 
decoding  error  is  made  if  an  incorrect  branch  is  passed  by  the  window. 

During the process of searching the tree, a noise event often causes
an incorrect branch to look "sufficiently probable". A succession of noise
events can make a succession of incorrect branches or an incorrect path look
"sufficiently probable". As the decoder proceeds along an incorrect path one
of two results must occur: either the decoder remains on an incorrect path
in which case it must eventually look improbable (law of large numbers), or the
decoder  somehow  stumbles  back  on  the  correct  path  and  effectively  returns  to 
normal.  The  meaning  of  the  latter  is  best  explained  with  reference  to  the  tree 
of  Fig.  10.  The  tree  structure  repeats  every  k  branches.  Thus,  for  example, 
the  tree  beginning  with  the  branch  corresponding  to  message  bit  is 

independent  of  the  choice  of  m^  and  earlier  message  bits  since  k  =  4. 
Consequently  it  is  possible  for  a  noise  sequence  to  force  an  incorrect  branching 
at  m^  leading  to  the  same  branching  at  that  would  have  been  chosen  had 

the  correct  branch  at  m^  been  taken. 

There  are,  therefore,  two  types  of  decoding  errors:  type  (1),  an 
incorrect  branching  that  continues  to  look  probable  and  results  in  the  decoder 
correcting  itself  and  type  (2),  an  incorrect  branching  which  looks  correct  for 
a  long  enough  time  so  that  the  error  is  passed  on  before  it  can  be  corrected. 

With these remarks in mind, let us consider decoding errors in the
SECO algorithm with reference to Fig. 13b. Suppose that the path up to node
P is correct and that an incorrect branch is taken at P. This incorrect
branch cannot be corrected (reaches P_2) if the subsequent noise is such that
P_3 or P_4 advances k branches beyond P and the path looks "sufficiently
probable." The two error mechanisms are indistinguishable here since only
k successful branches are required for a decoding.

Once P_2 has passed an incorrect branch at P, the best that can be
achieved is detection of the error. A type 1 error is undetectable; a type 2
error may be detected before being passed by P_1 by virtue of the law of
large numbers. One manifestation of this is an increase in the amount of
searching or equivalently a decrease in the average rate at which the window
moves to the right. If this is sufficiently slow P_5 will reach P_6, or the
buffer will overflow, before P is passed by P_1.

Another  mechanism  for  error  detection  involves  placing  a  confidence 
measure  on  the  decoding  criteria.  In  the  SECO  procedure  each  criterion  curve 
K_j has associated with it a number P_j which is indicative of the probability
of  an  incorrect  branch  being  passed  by  P^.  The  basic  decoding  procedure 
described  above  permits  the  decoder  to  use  successively  higher  criterion 
curves (higher P_j) as the noise level increases, resulting in higher decoding
error  probabilities.  An  obvious  detection  procedure  might  then  be  to 
establish  a  maximum  curve.  A  still  more  effective  procedure  is  indicated  in 
Fig.  14.  Here  we  plot  the  decoding  distance  (over  a  constraint  length)  as  a 
function  of  node  number.  In  curve  (a)  the  first  (oldest)  branch  under 
consideration  is  correct;  in  curve  (b)  it  is  incorrect.  The  important  fact  is 
that  it  is  possible  to  find  a  threshold  below  which  the  distance  remains  most 
of  the  time  when  on  the  correct  path  and  above  which  the  distance  remains  most 
of  the  time  when  on  an  incorrect  path.  Use  of  this  threshold  is  preferable  to 
fixing  a  maximum  criterion  curve  since  it  allows  excursions  of  the  distance 
beyond  the  threshold  provided  these  excursions  are  short  enough.  It  thus  allows 
the  decoder  to  correct  small  error  bursts  that  lead  to  high  criterion  curves 
but  which  can  be  subsequently  handled  by  the  decoder. 
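
The difference between a fixed maximum criterion and a threshold that
tolerates short excursions can be sketched as follows. The test shown is a
hypothetical simplification of the idea of Fig. 14: an error is declared
only when the decoding distance has stayed above the threshold for more than
a chosen number of consecutive nodes. The function name, the parameter
max_excursion and the distance values are all made up for illustration.

```python
def detect_error(distances, threshold, max_excursion):
    """Return the first node at which the distance has exceeded the threshold
    for more than max_excursion consecutive nodes, or None if that never happens."""
    run = 0
    for node, d in enumerate(distances):
        run = run + 1 if d > threshold else 0      # length of the current excursion
        if run > max_excursion:
            return node
    return None

distances = [2, 3, 6, 7, 3, 2, 6, 6, 6, 6, 2]      # made-up decoding distances per node
tolerant = detect_error(distances, threshold=5, max_excursion=2)
strict = detect_error(distances, threshold=5, max_excursion=0)
print(tolerant, strict)
```

With max_excursion set to zero the test behaves like a fixed maximum and
trips on the first short burst; allowing excursions of two nodes lets the
decoder ride through that burst and flags only the longer one.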

In  the  Fano  decoder,  the  two  types  of  errors  are  generally 
distinguishable.  If  the  buffer  is  infinitely  large  so  that  the  decoder  can  go  back 
infinitely  far  to  correct  an  incorrect  decision,  then  type  (2)  errors  will  never 
occur  since  an  incorrect  path  must  eventually  look  improbable.  Type  (1)  errors, 
of  course,  can  never  be  detected.  With  a  finite  buffer,  type  (2)  errors  can  occur. 
Suppose  that  in  Fig.  13c  an  incorrect  branch  is  taken  at  point  P.  The  decoder 
(point  P^)  proceeds  to  the  right  from  P  along  an  incorrect  path.  If  the  noise 
sequence  is  such  that  the  P^  passes  point  P  before  P^  returns  to  P  to 
correct  the  incorrect  branching,  then  that  error  cannot  be  corrected.  If  P  is 
later  passed  by  Pj  before  P^  retreats  to  P^  then  that  error  is  passed  on 
to the user. Such errors may be detected with very high probability if the
delay period between P_1 and P_2 is sufficiently large or by the application
of a distance test similar to that of Fig. 14.